Code
exped <- read_csv("data/exped_tidy.csv")
peaks <- read_csv("data/peaks_tidy.csv")
colnames(peaks)
head(peaks)
colnames(exped)
head(exped)
table(exped$SUCCESS1, useNA = "ifany")INFO 526 Final Project Proposal
Trevor Macdonald & Nandakumar Kuthalaraja
School of Information, University of Arizona
https://arizona.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=85be2d25-7854-45bb-8d26-b30f0048692c
High-altitude mountaineering offers a natural laboratory for studying how human, environmental, and logistical factors interact to shape risk and achievement. Using tidy versions of the Himalayan Database, this project leverages R (tidyverse, tidy models) and Quarto to build an interactive, reproducible data-visualisation narrative around two guiding questions:
What factors contribute to the success or failure of a summit?
We knit together expedition, peak, and temporal attributes to explore how season, route choice, peak height, team size, leader nationality, weather windows, and oxygen strategy correlate with summit outcomes. Visual encodings include faceted ridge plots of summit probability across seasons, interactive geographic representations of route popularity versus success.
A multilevel logistic model provides effect-size estimates while accounting for peak-level heterogeneity.
Does an expedition’s funding affect safety outcomes?
Because direct cost metrics are absent, we construct a funding proxy that blends sponsorship type, total members & hired staff, oxygen usage, and expedition duration.
We then relate this composite to safety indicators like member and staff fatalities, high-altitude illnesses, rescue events via scatter-plot matrices, concentration curves, and Poisson regression. Stratified visualizations highlight differential risk patterns between commercial and self-organised teams.
This project looks at data from many Himalayan mountain climbing trips. We use two main files: one has details about each expedition (called exped_tidy.csv), and the other has information about the mountains themselves (peaks_tidy.csv). The expedition data covers hundreds of trips over many years. For each trip, we know things like which mountain was climbed, how many people were in the team, if they used oxygen, if it was a commercial (guided) trip, and whether the team reached the summit or not.
The mountain (peaks) data gives us extra facts, like each mountain’s name, height, and what region it is in. By putting these two datasets together, we can explore how things like team size, the use of oxygen, type of trip, and which mountain was climbed affect the chances of reaching the summit.
Analyze expedition outcomes using both expedition and peak variables. “Success” will be defined based on the `TERMREASON_FACTOR`, with specific values representing a successful summit.
Use factors like – geography, team size, year, and commercial vs non-commercial expeditions
Climbing a Himalayan peak is a big challenge, and not every team that tries will reach the top. Some expeditions succeed, while others turn back before the summit. But what makes the difference? In this project, we want to understand what factors help climbers succeed and what might make them fail.
We have used following factors:
| Explore Team Size |
| Effect of Geography |
| Explore by Year |
| Explore Commercial Factor |
table(exped$SUCCESS1, useNA = "ifany")
exped2 <- exped |>
mutate(
success = ifelse(SUCCESS1 %in% c("Y", "Yes", "yes"), 1,
ifelse(SUCCESS1 %in% c("N", "No", "no"), 0, as.numeric(SUCCESS1))),
peak = as.factor(PEAKID),
team_size = as.numeric(TOTMEMBERS),
oxygen = ifelse(O2USED %in% c("Y", "Yes", "yes"), 1,
ifelse(O2USED %in% c("N", "No", "no"), 0, as.numeric(O2USED))),
season = as.factor(SEASON)
) |>
filter(!is.na(success), !is.na(team_size), !is.na(oxygen))
model <- glm(
success ~ peak + team_size + oxygen + season + YEAR,
data = exped2,
family = "binomial"
)The Overall Success vs Failure Rate out of all expeditions.
exped2 |>
mutate(outcome = ifelse(success == 1, "Success", "Failure")) |>
group_by(outcome) |>
summarize(n = n()) |>
mutate(rate = n / sum(n)) |>
ggplot(aes(x = outcome, y = rate, fill = outcome)) +
geom_col(width = 0.5) +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(values = c("Success" = "forestgreen", "Failure" = "firebrick")) +
labs(
title = "Overall Expedition Success vs Failure Rate",
x = "Expedition Outcome",
y = "Proportion"
) +
theme_minimal()We will study with following variables:
| Potential Factors to Study Success Rate |
|---|
| Explore Team Size |
| Effect of Geography |
| Explore by Year |
| Explore Commercial Factor |
ggplot(exped2, aes(x = team_size, fill = factor(success))) +
geom_bar(position = "fill") +
scale_x_continuous(
breaks = seq(0, max(exped2$team_size, na.rm = TRUE), by = 10),
expand = expansion(add = c(0, 0))
) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
x = "Team Size",
y = "Proportion",
fill = "Success\n(1 = yes)",
title = "Proportion of Success vs. Failure\nby Team Size"
) +
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5),
plot.title = element_text(hjust = 0.5)
)df_bin <- exped2 |>
group_by(team_size) |>
summarize(rate = mean(success), n = n(), .groups="drop") |>
filter(n > 3)
ggplot(df_bin, aes(x = team_size, y = rate)) +
geom_segment(aes(xend = team_size, y = 0, yend = rate), color = "grey70") +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
x = "Team Size",
y = "Success Rate",
title = "Lollipop Chart of Success Rate by Team Size",
caption = "Source: Himalayan Database (2020–2024)"
)These plots suggests a non-linear relationship between team size and summit success:
Very small teams (1–5 climbers) tend to have very high success rates.
Probable cause - agility, lighter logistical needs, and tighter decision-making.
Small-to-medium teams (3–8 climbers) show a noticeable dip in success.
Probable cause - enough complexity to introduce friction but not enough manpower to provide robust support.
Medium-large teams (9–20 climbers) exhibit steadily increasing success rates, peaking around 20 members at or near 100%.
Probable cause - have enough members to spread tasks, establish well-stocked camps without yet suffering the full burden of very large-group logistics.
Very large teams (20+) see a gradual decline in success.
Probable cause - supplies, campsite congestion, decision latency.
peak_success <- exped2 |>
left_join(peaks |> select(PEAKID, PKNAME, HEIGHTM), by = "PEAKID") |>
group_by(PEAKID, PKNAME, HEIGHTM) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 10)
ggplot(peak_success, aes(x = HEIGHTM, y = success_rate, label = PKNAME)) +
geom_point(color = "steelblue", size = 3, alpha = 0.7) +
geom_text(vjust = -0.8, size = 2.5, check_overlap = TRUE) +
labs(
title = "Altitude vs. Success Rate Across Peaks",
x = "Altitude (meters)",
y = "Success Rate"
) + geom_smooth(method = "loess", se = FALSE, color = "darkred")peak_success <- exped2 |>
left_join(peaks |> select(PEAKID, PKNAME, HEIGHTM), by = "PEAKID") |>
group_by(PKNAME, HEIGHTM) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 10)
ggplot(peak_success, aes(x = reorder(PKNAME, HEIGHTM), y = HEIGHTM, color = success_rate, size = n)) +
geom_point(alpha = 0.8) +
scale_color_gradient(low = "red", high = "green") +
labs(
title = "Altitude and Success Rate Across Peaks",
x = "Peak",
y = "Altitude (meters)",
color = "Success Rate",
size = "Number of Expeditions"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1)
)Geography, in particular the altitude band of a peak—turns out not to be a simple “the higher you go, the harder it gets” story.
Low-altitude peaks (~6800m, e.g. Ama Dablam) enjoy very high success rates, thanks to gentler slopes, shorter approaches and straightforward rope-fixed “ladder” routes
Mid-altitude peaks (~7100–7300m, like Pumori and Baruntse) dips to the bottom of the curve,mountains are often more technical, less commercialized, with fewer fixed ropes and smaller teams—so even though they’re “lower,” they prove tougher
Higher Peaks (8000m+ like Everest) enjoys very high success rates, as they benefit from massive infrastructure (high-traffic base camps, helicopter support etc.), which more than offsets the physiological challenge of extreme altitude
Overall, a peak’s infrastructure and traffic are at least as critical to success as its height.
yearly_success <- exped2 |>
group_by(YEAR) |>
summarize(
success_rate = mean(success == 1, na.rm = TRUE),
n = n()
) |>
filter(n > 1)
ggplot(yearly_success, aes(x = YEAR, y = success_rate)) +
geom_line(color = "steelblue", size = 1.2) +
geom_point(color = "darkred", size = 2) +
geom_smooth(method = "loess", se = FALSE, color = "forestgreen", linetype = "dashed") +
labs(
title = "Expedition Success Rate by Year (with Trend)",
x = "Year",
y = "Success Rate"
) +
scale_y_continuous(labels = scales::percent) +
theme_minimal()peak_year <- exped2 |>
left_join(peaks |> select(PEAKID, PKNAME), by = "PEAKID") |>
group_by(YEAR, PKNAME) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 3)
# Plot heatmap
ggplot(peak_year, aes(x = YEAR, y = reorder(PKNAME, -success_rate), fill = success_rate)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "red", high = "green") +
labs(
title = "Success Rate by Year and Peak",
x = "Year",
y = "Peak",
fill = "Success Rate"
) +
theme_minimal()These plots suggests, despite a pandemic-driven slump in 2021, overall expedition success has rebounded strongly by 2024
2020 average success rate ~ 72%
Probable cause – pre-pandemic normalcy, full staffing, established logistics.
2021-2022 average success rate fell to ~ 63%
Probable cause – COVID travel restrictions & teams adapted to new health protocols and smaller expedition sizes.
2023-2024 average success rate rebound to ~ 80%
Probable cause – full operational recovery, improved high-altitude gear, accumulated guiding experience.
exped2 <- exped2 |>
mutate(
commercial = case_when(
COMRTE == TRUE ~ "Commercial",
COMRTE == FALSE ~ "Non-Commercial",
TRUE ~ NA_character_
)
)
comm_year <- exped2 |>
filter(!is.na(commercial)) |>
group_by(YEAR, commercial) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 3)
ggplot(comm_year, aes(x = YEAR, y = success_rate, color = commercial)) +
geom_line(size = 1.1) +
geom_point() +
scale_color_manual(values = c("Commercial" = "skyblue", "Non-Commercial" = "gray40")) +
scale_y_continuous(labels = scales::percent) +
labs(
title = "Success Rate by Year: Commercial vs Non-Commercial",
x = "Year",
y = "Success Rate",
color = "Expedition Type"
) +
theme_minimal()exped2 |>
filter(!is.na(commercial)) |>
left_join(peaks |> select(PEAKID, PKNAME), by = "PEAKID") |>
group_by(PKNAME, commercial) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 5) |>
ggplot(aes(x = reorder(PKNAME, -success_rate), y = success_rate, fill = commercial)) +
geom_col(position = "dodge") +
coord_flip() +
scale_fill_manual(values = c("Commercial" = "skyblue", "Non-Commercial" = "gray70")) +
labs(
title = "Success Rate by Peak and Expedition Type",
x = "Peak",
y = "Success Rate",
fill = "Type"
) +
theme_minimal()exped2 |>
filter(!is.na(commercial)) |>
group_by(team_size = TOTMEMBERS, commercial) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 3, !is.na(team_size)) |>
ggplot(aes(x = team_size, y = success_rate, color = commercial)) +
geom_point(size = 2, alpha = 0.7) +
geom_smooth(se = FALSE) +
scale_color_manual(values = c("Commercial" = "skyblue", "Non-Commercial" = "gray70")) +
scale_y_continuous(labels = scales::percent) +
labs(
title = "Team Size vs Success Rate by Expedition Type",
x = "Team Size",
y = "Success Rate",
color = "Expedition Type"
) +
theme_minimal()These plots suggests, A professional, well-resourced support structure not only accelerates recovery from external shocks (like a pandemic) but also delivers uniformly higher success rates—regardless of mountain or team size.
Commercial expeditions rebound faster and to a higher plateau
In 2021, both types dipped (COVID impact), but commercial climbs dropped only to ~67%, whereas non-commercial fell to ~53%. By 2024, commercial success surged to ~95 %, while non-commercial languished around ~64%.
Across individual peaks, company-run trips consistently outshine independent ones
On major peaks (Everest, Lhotse, Ama Dablam), commercial success rates hover in the 85–95 % range. Non-commercial attempts on the same mountains are 5–10 % lower.
Team-size effects differ by expedition type
Commercial: success stays high (75–100 %) across all team sizes, peaking around 20 members.
Non-commercial: very small groups (> 80 %) and very large groups (~75 %) do better, but mid-sized teams (5–10 climbers) dip down near 50 %
The analysis will be very similar for this question. I suspect there will be some overlap, funding and success rate etc. We will use many of the same variables, but more specifically, FUNDING_TYPE.
We will Compare specific funding types like commercial vs. independent expeditions across countries for more insights
NOTE: Add “real world” examples of expedition success or failure to make analysis relatable.
# Lean tibble of variables of interest
exped_q2 <- exped |>
select(
YEAR, # Year of the expedition
SPONSOR, # Sponsor or funding organization
SUCCESS1, # Success on primary route
SUCCESS2, # Success on second route
SUCCESS3, # Success on third route
SUCCESS4, # Success on fourth route
TERMDATE, # Date expedition was terminated
TERMREASON, # Numeric code for termination reason
TERMREASON_FACTOR, # Description of termination reason
TOTMEMBERS, # Total members in the team
MDEATHS, # Number of member deaths
HDEATHS, # Hired staff deaths
OTHERSMTS, # Other summits during expedition
ACCIDENTS, # Description of accidents
AGENCY, # Trekking/expedition agency
O2USED #
) |>
mutate(
ANY_SUCCESS = SUCCESS1 | SUCCESS2 | SUCCESS3 | SUCCESS4,
FATALITIES = (MDEATHS + HDEATHS > 0),
OUTCOME_LABEL = case_when(
ANY_SUCCESS & !FATALITIES ~ "Success, No Deaths",
ANY_SUCCESS & FATALITIES ~ "Success + Deaths",
!ANY_SUCCESS & FATALITIES ~ "Failure + Deaths",
TRUE ~ "Failure, No Deaths"
)
) |>
select(-SUCCESS1, -SUCCESS2, -SUCCESS3, -SUCCESS4, -MDEATHS, -HDEATHS)
exped_q2 <- exped_q2 |>
mutate(
TERMINATION_TYPE = case_when(
TERMREASON %in% c(1, 2, 3) ~ "Successful",
TERMREASON %in% c(4, 5) ~ "Bad Weather/Conditions",
TERMREASON %in% c(6, 7) ~ "Accident/Illness/Exhaustion",
TERMREASON %in% c(8, 9, 10) ~ "Logistics/Technical",
TERMREASON %in% c(11, 12, 13) ~ "No Attempt/Base Only",
TERMREASON == 14 ~ "Other",
TRUE ~ "Unknown"
)
)library(stringr)
# Source: https://stackoverflow.com/questions/59082243/multiple-patterns-for-string-matching-using-case-when
sponsor_prefix_tbl <- exped_q2 |>
mutate(
SPONSOR_CLEAN = str_to_lower(str_trim(SPONSOR)),
SPONSOR_CLEAN = str_remove(SPONSOR_CLEAN, "\\s*\\b20\\d{2}\\b$"),
SPONSOR_PREFIX = str_extract(SPONSOR_CLEAN, "^\\w+\\s*\\w*")
) |>
count(SPONSOR_PREFIX, sort = TRUE)
#Sponsors <- exped_q2 |>
#distinct(SPONSOR_CLEAN) #|>
#count()
exped_q2 <- exped_q2 |>
mutate(
SPONSOR_CLEAN = str_to_lower(str_trim(SPONSOR)),
SPONSOR_CLEAN = str_remove(SPONSOR_CLEAN, "\\s*\\b20\\d{2}\\b$"),
SPONSOR_PREFIX = str_extract(SPONSOR_CLEAN, "^\\w+\\s*\\w*"),
FUNDING_TYPE = case_when(
str_detect(SPONSOR_PREFIX, "army|police|military|defense") ~ "Military",
str_detect(SPONSOR_PREFIX, "university|college|school|club|jac|univ") ~ "Academic/Alpine Club",
str_detect(SPONSOR_PREFIX, "guides|treks|adventure|travel|ascent|trip|climbalaya|madison|imagine|kobler|makalu|highland|satori|elite|glacier|dream|thamserku") ~ "Commercial",
str_detect(SPONSOR_PREFIX, "private|individual|self|solo|1st|friends|father|jon|jost|alex|kilian|marc|kishori|soren") ~ "Private/Individual",
str_detect(SPONSOR_PREFIX, "national geographic|japanese|korean|slovakian|french|chinese|indian|nepalese|russian|german|austrian|canadian|italian|spanish|swiss|american|latvian") ~ "National Program",
is.na(SPONSOR_PREFIX) | SPONSOR_PREFIX == "" ~ "Other/Unknown",
TRUE ~ "Other/Unknown"
)
)|>
mutate(
FUNDING_SIMPLIFIED = case_when(
FUNDING_TYPE == "Commercial" ~ "Commercial",
FUNDING_TYPE %in% c("Academic/Alpine Club", "National Program", "Military") ~ "Non-commercial",
FUNDING_TYPE %in% c("Private/Individual", "Other/Unknown") ~ "Independent/Other"
)
)
#glimpse(exped_q2)Summary Statistics on Safety
| Expedition Outcomes by Funding Type (2020–2024) | |||||
|---|---|---|---|---|---|
| Funding Type | Total Expeditions | Number Successful | Number with Fatalities | Success Rate | Fatality Rate |
| Commercial | 274 | 212 | 8 | 77.4% | 2.9% |
| Independent/Other | 517 | 365 | 27 | 70.6% | 5.2% |
| Non-commercial | 91 | 51 | 3 | 56.0% | 3.3% |
Commercial expeditions appear safer
Commercial climbs are often assumed to be safer due to professional guides, better logistics, and more resources. The summary table supports this: commercial teams have a slightly higher success rate (77.4% vs. 70.6% for independents), but the fatality rate among independent expeditions is nearly double (5.2% vs. 2.9%).
Independent teams are more numerous, but face more risk
Despite their prevalence (over 500 expeditions), independent teams experience lower success and higher fatality rates. This raises important questions about how funding structure influences safety outcomes.
This motivates a deeper dive into funding and risk
The following analysis breaks expeditions into three groups Commercial, Independent, and Other and examines termination reasons, outcomes, team size distributions, and year-over-year trends to explore the relationship between funding and expedition safety.
Commercial expeditions are far more likely to end successfully
Over 75% of commercial expeditions ended in success, compared to around 65% for independent teams and closer to 60% for the “Other” group. Independent and other-funded expeditions show higher rates of failure, including both bad weather and medical/logistical setbacks.
Termination reasons differ subtly by funding structure
Commercial expeditions rarely terminate due to logistical or medical issues. Most reach the summit or are turned back by weather. Independent and other expeditions show a higher share of terminations due to accidents, illness, or exhaustion.
Deaths occur disproportionately on failed expeditions
In the outcome chart, “Success + Deaths” remains a small slice across all groups. The vast majority of fatal expeditions failed to summit, especially in the independent and other categories reinforcing that success correlates with safety.
Independent expeditions consistently show the highest fatality risk
From 2021 to 2024, independent teams reported the highest fatality rate every year rising steadily from ~4% to over 8%. This trend held across both plots, suggesting the risk isn’t a factor of sample size.
Commercial expeditions maintain lower risk and fewer deaths overall
The line plots show commercial expeditions peaking in 2023 with just ~3 fatalities, while independent teams saw nearly 9 in the same year more than double any other group. Their fatality rate remained under 5% throughout the entire period.
“Other” funded expeditions are low but volatile
The “Other” group stayed below 2 deaths annually, but their fatality rate fluctuated wildly peaking above 8% in 2023, then dropping back to zero by 2024. This variability likely reflects small sample sizes and inconsistent structure.
Taken together, funding appears to shape safety outcomes
Independent expeditions show consistently worse outcomes in both rates and counts. The data suggests commercial expeditions may offer better safety due to greater resources, experienced guides, and more structured decision-making.
Next question: does team size make a difference? You’d think larger teams might be safer due to support and redundancy, but also possibly riskier if they’re less experienced or move slowly. This scatter plot checks whether total team size correlates with success, failure, or fatalities and whether that varies by funding type.
Fatal events concentrate in the middle, not extremes
Fatalities rarely occurred among the smallest or largest teams. For both commercial and independent groups, the 5–15 range saw the most deaths, hinting that “medium” teams may face unique vulnerabilities—perhaps not small enough to retreat quickly, nor large enough for redundancy.
Team size as a proxy for resources in commercial climbs
Large commercial groups often come with oxygen, Sherpa support, and fixed lines. Smaller independent teams may lack this safety net and push harder routes with more risk exposure.
---
title: "A Statistical Exploration of Himalayan Expeditions"
subtitle: "INFO 526 Final Project Proposal"
author:
- name: "Trevor Macdonald & Nandakumar Kuthalaraja"
affiliations:
- name: "School of Information, University of Arizona"
format:
html:
code-tools: true
code-overflow: wrap
embed-resources: true
editor: visual
execute:
warning: false
echo: false
---
```{r}
#| label: load-packages
#| include: false
#| echo: false
# Load packages here
pacman::p_load(tidymodels,
tidyverse, patchwork, scales, ggplot2)
```
```{r}
#| label: setup
#| include: false
#| echo: false
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 11))
knitr::opts_chunk$set(
fig.retina = 3,
dpi = 300,
fig.width = 6,
fig.asp = 0.618
)
```
## Presentation Video
<https://arizona.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=85be2d25-7854-45bb-8d26-b30f0048692c>
## Abstract
High-altitude mountaineering offers a natural laboratory for studying how human, environmental, and logistical factors interact to shape risk and achievement. Using tidy versions of the Himalayan Database, this project leverages R (tidyverse, tidy models) and Quarto to build an interactive, reproducible data-visualisation narrative around two guiding questions:
- **What factors contribute to the success or failure of a summit?**
- We knit together expedition, peak, and temporal attributes to explore how season, route choice, peak height, team size, leader nationality, weather windows, and oxygen strategy correlate with summit outcomes. Visual encodings include faceted ridge plots of summit probability across seasons, interactive geographic representations of route popularity versus success.
- A multilevel logistic model provides effect-size estimates while accounting for peak-level heterogeneity.
- **Does an expedition’s funding affect safety outcomes?**
- Because direct cost metrics are absent, we construct a funding proxy that blends sponsorship type, total members & hired staff, oxygen usage, and expedition duration.
- We then relate this composite to safety indicators like member and staff fatalities, high-altitude illnesses, rescue events via scatter-plot matrices, concentration curves, and Poisson regression. Stratified visualizations highlight differential risk patterns between commercial and self-organised teams.
## Introduction
This project looks at data from many Himalayan mountain climbing trips. We use two main files: one has details about each expedition (called `exped_tidy.csv`), and the other has information about the mountains themselves (`peaks_tidy.csv`). The expedition data covers hundreds of trips over many years. For each trip, we know things like which mountain was climbed, how many people were in the team, if they used oxygen, if it was a commercial (guided) trip, and whether the team reached the summit or not.
The mountain (peaks) data gives us extra facts, like each mountain’s name, height, and what region it is in. By putting these two datasets together, we can explore how things like team size, the use of oxygen, type of trip, and which mountain was climbed affect the chances of reaching the summit.
## Question 1: What factors contribute to the success or failure of a summit?
- Analyze expedition outcomes using both expedition and peak variables. "Success" will be defined based on the \`TERMREASON_FACTOR\`, with specific values representing a successful summit.
- Use factors like -- geography, team size, year, and commercial vs non-commercial expeditions
### Introduction
Climbing a Himalayan peak is a big challenge, and not every team that tries will reach the top. Some expeditions succeed, while others turn back before the summit. But what makes the difference? In this project, we want to understand what factors help climbers succeed and what might make them fail.
We have used following factors:
| |
|---------------------------|
| Explore Team Size |
| Effect of Geography |
| Explore by Year |
| Explore Commercial Factor |
### Approach
- Get Data from the CSV files
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
exped <- read_csv("data/exped_tidy.csv")
peaks <- read_csv("data/peaks_tidy.csv")
colnames(peaks)
head(peaks)
colnames(exped)
head(exped)
table(exped$SUCCESS1, useNA = "ifany")
```
- Build Model & Variable Inferences
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
table(exped$SUCCESS1, useNA = "ifany")
exped2 <- exped |>
mutate(
success = ifelse(SUCCESS1 %in% c("Y", "Yes", "yes"), 1,
ifelse(SUCCESS1 %in% c("N", "No", "no"), 0, as.numeric(SUCCESS1))),
peak = as.factor(PEAKID),
team_size = as.numeric(TOTMEMBERS),
oxygen = ifelse(O2USED %in% c("Y", "Yes", "yes"), 1,
ifelse(O2USED %in% c("N", "No", "no"), 0, as.numeric(O2USED))),
season = as.factor(SEASON)
) |>
filter(!is.na(success), !is.na(team_size), !is.na(oxygen))
model <- glm(
success ~ peak + team_size + oxygen + season + YEAR,
data = exped2,
family = "binomial"
)
```
- Start plotting various charts building up to the case
### Start
::::: columns
::: {.column width="55%"}
The Overall Success vs Failure Rate out of all expeditions.
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
exped2 |>
mutate(outcome = ifelse(success == 1, "Success", "Failure")) |>
group_by(outcome) |>
summarize(n = n()) |>
mutate(rate = n / sum(n)) |>
ggplot(aes(x = outcome, y = rate, fill = outcome)) +
geom_col(width = 0.5) +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(values = c("Success" = "forestgreen", "Failure" = "firebrick")) +
labs(
title = "Overall Expedition Success vs Failure Rate",
x = "Expedition Outcome",
y = "Proportion"
) +
theme_minimal()
```
:::
::: {.column width="45%"}
We will study with following variables:
| Potential Factors to Study Success Rate |
|-----------------------------------------|
| Explore Team Size |
| Effect of Geography |
| Explore by Year |
| Explore Commercial Factor |
:::
:::::
### Explore Team Size
::::: columns
::: {.column width="50%"}
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
#|
ggplot(exped2, aes(team_size)) +
geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
facet_wrap(~ success, labeller = labeller(success = c(`0` = "Failure", `1` = "Success"))) +
labs(
x = "Team Size",
y = "Count",
title = "Distribution of Team Sizes\nfor Success vs. Failure"
)
```
:::
::: {.column width="50%"}
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
ggplot(exped2, aes(x = team_size, fill = factor(success))) +
geom_bar(position = "fill") +
scale_x_continuous(
breaks = seq(0, max(exped2$team_size, na.rm = TRUE), by = 10),
expand = expansion(add = c(0, 0))
) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
x = "Team Size",
y = "Proportion",
fill = "Success\n(1 = yes)",
title = "Proportion of Success vs. Failure\nby Team Size"
) +
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5),
plot.title = element_text(hjust = 0.5)
)
```
:::
:::::
:::::: columns
::: {.column width="25%"}
:::
::: {.column width="50%" style="\"min-height:100px;"}
```{r, fig.height=3}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
df_bin <- exped2 |>
group_by(team_size) |>
summarize(rate = mean(success), n = n(), .groups="drop") |>
filter(n > 3)
ggplot(df_bin, aes(x = team_size, y = rate)) +
geom_segment(aes(xend = team_size, y = 0, yend = rate), color = "grey70") +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
x = "Team Size",
y = "Success Rate",
title = "Lollipop Chart of Success Rate by Team Size",
caption = "Source: Himalayan Database (2020–2024)"
)
```
:::
::: {.column width="25%"}
<!-- blank right spacer -->
:::
::::::
### Explore Team Size Analysis
::: {style="font-size:0.8em; line-height:1.2;"}
`These plots suggests a non-linear relationship between team size and summit success:`
- **`Very small teams (1–5 climbers)`** tend to have very high success rates.
Probable cause - agility, lighter logistical needs, and tighter decision-making.
- **`Small-to-medium teams (3–8 climbers)`** show a noticeable dip in success.
Probable cause - enough complexity to introduce friction but not enough manpower to provide robust support.
- **`Medium-large teams (9–20 climbers)`** exhibit steadily increasing success rates, peaking around 20 members at or near 100%.
Probable cause - have enough members to spread tasks, establish well-stocked camps without yet suffering the full burden of very large-group logistics.
- **`Very large teams (20+)`** see a gradual decline in success.
Probable cause - supplies, campsite congestion, decision latency.
:::
### Effect of Geography
::::: columns
::: {.column width="50%"}
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
peak_success <- exped2 |>
left_join(peaks |> select(PEAKID, PKNAME, HEIGHTM), by = "PEAKID") |>
group_by(PEAKID, PKNAME, HEIGHTM) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 10)
ggplot(peak_success, aes(x = HEIGHTM, y = success_rate, label = PKNAME)) +
geom_point(color = "steelblue", size = 3, alpha = 0.7) +
geom_text(vjust = -0.8, size = 2.5, check_overlap = TRUE) +
labs(
title = "Altitude vs. Success Rate Across Peaks",
x = "Altitude (meters)",
y = "Success Rate"
) + geom_smooth(method = "loess", se = FALSE, color = "darkred")
```
:::
::: {.column width="50%"}
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
peak_success <- exped2 |>
left_join(peaks |> select(PEAKID, PKNAME, HEIGHTM), by = "PEAKID") |>
group_by(PKNAME, HEIGHTM) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 10)
ggplot(peak_success, aes(x = reorder(PKNAME, HEIGHTM), y = HEIGHTM, color = success_rate, size = n)) +
geom_point(alpha = 0.8) +
scale_color_gradient(low = "red", high = "green") +
labs(
title = "Altitude and Success Rate Across Peaks",
x = "Peak",
y = "Altitude (meters)",
color = "Success Rate",
size = "Number of Expeditions"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1)
)
```
:::
:::::
:::::: columns
::: {.column width="25%"}
:::
::: {.column width="50%" style="\"min-height:60px;"}
```{r, fig.height=3}
```
:::
::: {.column width="25%"}
<!-- blank right spacer -->
:::
::::::
### Effect of Geography Analysis
::: {style="font-size:0.8em; line-height:1.2;"}
`Geography, in particular the altitude band of a peak—turns out not to be a simple “the higher you go, the harder it gets” story.`
- **`Low-altitude peaks (~6800m, e.g. Ama Dablam)`** enjoy very high success rates, thanks to gentler slopes, shorter approaches and straightforward rope-fixed “ladder” routes
- **`Mid-altitude peaks (~7100–7300m, like Pumori and Baruntse)`** dips to the bottom of the curve,mountains are often more technical, less commercialized, with fewer fixed ropes and smaller teams—so even though they’re “lower,” they prove tougher
- **`Higher Peaks (8000m+ like Everest)`** enjoys very high success rates, as they benefit from massive infrastructure (high-traffic base camps, helicopter support etc.), which more than offsets the physiological challenge of extreme altitude
Overall, a peak’s infrastructure and traffic are at least as critical to success as its height.
:::
### Explore by Year
::::: columns
::: {.column width="50%"}
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
yearly_success <- exped2 |>
group_by(YEAR) |>
summarize(
success_rate = mean(success == 1, na.rm = TRUE),
n = n()
) |>
filter(n > 1)
ggplot(yearly_success, aes(x = YEAR, y = success_rate)) +
geom_line(color = "steelblue", size = 1.2) +
geom_point(color = "darkred", size = 2) +
geom_smooth(method = "loess", se = FALSE, color = "forestgreen", linetype = "dashed") +
labs(
title = "Expedition Success Rate by Year (with Trend)",
x = "Year",
y = "Success Rate"
) +
scale_y_continuous(labels = scales::percent) +
theme_minimal()
```
:::
::: {.column width="50%"}
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
peak_year <- exped2 |>
left_join(peaks |> select(PEAKID, PKNAME), by = "PEAKID") |>
group_by(YEAR, PKNAME) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 3)
# Plot heatmap
ggplot(peak_year, aes(x = YEAR, y = reorder(PKNAME, -success_rate), fill = success_rate)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "red", high = "green") +
labs(
title = "Success Rate by Year and Peak",
x = "Year",
y = "Peak",
fill = "Success Rate"
) +
theme_minimal()
```
:::
:::::
:::::: columns
::: {.column width="25%"}
:::
::: {.column width="50%" style="\"min-height:100px;"}
```{r, fig.height=3}
```
:::
::: {.column width="25%"}
<!-- blank right spacer -->
:::
::::::
### Explore by Year Analysis
::: {style="font-size:0.8em; line-height:1.2;"}
`These plots suggests, despite a pandemic-driven slump in 2021, overall expedition success has rebounded strongly by 2024`
- **`2020`** average success rate \~ 72%
Probable cause – pre-pandemic normalcy, full staffing, established logistics.
- **`2021-2022`** average success rate fell to \~ 63%
Probable cause – COVID travel restrictions & teams adapted to new health protocols and smaller expedition sizes.
- **`2023-2024`** average success rate rebound to \~ 80%
Probable cause – full operational recovery, improved high-altitude gear, accumulated guiding experience.
:::
### Commercial Factor
::::: columns
::: {.column width="50%"}
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
exped2 <- exped2 |>
mutate(
commercial = case_when(
COMRTE == TRUE ~ "Commercial",
COMRTE == FALSE ~ "Non-Commercial",
TRUE ~ NA_character_
)
)
comm_year <- exped2 |>
filter(!is.na(commercial)) |>
group_by(YEAR, commercial) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 3)
ggplot(comm_year, aes(x = YEAR, y = success_rate, color = commercial)) +
geom_line(size = 1.1) +
geom_point() +
scale_color_manual(values = c("Commercial" = "skyblue", "Non-Commercial" = "gray40")) +
scale_y_continuous(labels = scales::percent) +
labs(
title = "Success Rate by Year: Commercial vs Non-Commercial",
x = "Year",
y = "Success Rate",
color = "Expedition Type"
) +
theme_minimal()
```
:::
::: {.column width="50%"}
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
exped2 |>
filter(!is.na(commercial)) |>
left_join(peaks |> select(PEAKID, PKNAME), by = "PEAKID") |>
group_by(PKNAME, commercial) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 5) |>
ggplot(aes(x = reorder(PKNAME, -success_rate), y = success_rate, fill = commercial)) +
geom_col(position = "dodge") +
coord_flip() +
scale_fill_manual(values = c("Commercial" = "skyblue", "Non-Commercial" = "gray70")) +
labs(
title = "Success Rate by Peak and Expedition Type",
x = "Peak",
y = "Success Rate",
fill = "Type"
) +
theme_minimal()
```
:::
:::::
:::::: columns
::: {.column width="25%"}
:::
::: {.column width="50%" style="\"min-height:100px;"}
```{r, fig.height=3}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
exped2 |>
filter(!is.na(commercial)) |>
group_by(team_size = TOTMEMBERS, commercial) |>
summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
filter(n > 3, !is.na(team_size)) |>
ggplot(aes(x = team_size, y = success_rate, color = commercial)) +
geom_point(size = 2, alpha = 0.7) +
geom_smooth(se = FALSE) +
scale_color_manual(values = c("Commercial" = "skyblue", "Non-Commercial" = "gray70")) +
scale_y_continuous(labels = scales::percent) +
labs(
title = "Team Size vs Success Rate by Expedition Type",
x = "Team Size",
y = "Success Rate",
color = "Expedition Type"
) +
theme_minimal()
```
:::
::: {.column width="25%"}
<!-- blank right spacer -->
:::
::::::
### Commercial Factor Analysis
::: {style="font-size:0.8em; line-height:1.2;"}
`These plots suggests, A professional, well-resourced support structure not only accelerates recovery from external shocks (like a pandemic) but also delivers uniformly higher success rates—regardless of mountain or team size.`
- **`Commercial expeditions rebound faster and to a higher plateau`**
In 2021, both types dipped (COVID impact), but commercial climbs dropped only to \~67%, whereas non-commercial fell to \~53%. By 2024, commercial success surged to \~95 %, while non-commercial languished around \~64%.
- **`Across individual peaks, company-run trips consistently outshine independent ones`**
On major peaks (Everest, Lhotse, Ama Dablam), commercial success rates hover in the 85–95 % range. Non-commercial attempts on the same mountains are 5–10 % lower.
- **`Team-size effects differ by expedition type`**
**Commercial**: success stays high (75–100 %) across all team sizes, peaking around 20 members.\
**Non-commercial**: very small groups (\> 80 %) and very large groups (\~75 %) do better, but mid-sized teams (5–10 climbers) dip down near 50 %
:::
## Question 2
The analysis will be very similar for this question. I suspect there will be some overlap, funding and success rate etc. We will use many of the same variables, but more specifically, FUNDING_TYPE.
We will Compare specific funding types like commercial vs. independent expeditions across countries for more insights
NOTE: Add "real world" examples of expedition success or failure to make analysis relatable.
### Approach
- Get Data from the CSV files
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
exped <- read_csv("data/exped_tidy.csv")
```
- Data Wrangling & Variable Inferences
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
# Lean tibble of variables of interest
exped_q2 <- exped |>
select(
YEAR, # Year of the expedition
SPONSOR, # Sponsor or funding organization
SUCCESS1, # Success on primary route
SUCCESS2, # Success on second route
SUCCESS3, # Success on third route
SUCCESS4, # Success on fourth route
TERMDATE, # Date expedition was terminated
TERMREASON, # Numeric code for termination reason
TERMREASON_FACTOR, # Description of termination reason
TOTMEMBERS, # Total members in the team
MDEATHS, # Number of member deaths
HDEATHS, # Hired staff deaths
OTHERSMTS, # Other summits during expedition
ACCIDENTS, # Description of accidents
AGENCY, # Trekking/expedition agency
O2USED #
) |>
mutate(
ANY_SUCCESS = SUCCESS1 | SUCCESS2 | SUCCESS3 | SUCCESS4,
FATALITIES = (MDEATHS + HDEATHS > 0),
OUTCOME_LABEL = case_when(
ANY_SUCCESS & !FATALITIES ~ "Success, No Deaths",
ANY_SUCCESS & FATALITIES ~ "Success + Deaths",
!ANY_SUCCESS & FATALITIES ~ "Failure + Deaths",
TRUE ~ "Failure, No Deaths"
)
) |>
select(-SUCCESS1, -SUCCESS2, -SUCCESS3, -SUCCESS4, -MDEATHS, -HDEATHS)
exped_q2 <- exped_q2 |>
mutate(
TERMINATION_TYPE = case_when(
TERMREASON %in% c(1, 2, 3) ~ "Successful",
TERMREASON %in% c(4, 5) ~ "Bad Weather/Conditions",
TERMREASON %in% c(6, 7) ~ "Accident/Illness/Exhaustion",
TERMREASON %in% c(8, 9, 10) ~ "Logistics/Technical",
TERMREASON %in% c(11, 12, 13) ~ "No Attempt/Base Only",
TERMREASON == 14 ~ "Other",
TRUE ~ "Unknown"
)
)
```
```{r}
#| code-fold: true
#| code-summary: "Code"
#| echo: true
#| results: hide
#| message: false
#| warning: false
library(stringr)
# Source: https://stackoverflow.com/questions/59082243/multiple-patterns-for-string-matching-using-case-when
sponsor_prefix_tbl <- exped_q2 |>
mutate(
SPONSOR_CLEAN = str_to_lower(str_trim(SPONSOR)),
SPONSOR_CLEAN = str_remove(SPONSOR_CLEAN, "\\s*\\b20\\d{2}\\b$"),
SPONSOR_PREFIX = str_extract(SPONSOR_CLEAN, "^\\w+\\s*\\w*")
) |>
count(SPONSOR_PREFIX, sort = TRUE)
#Sponsors <- exped_q2 |>
#distinct(SPONSOR_CLEAN) #|>
#count()
exped_q2 <- exped_q2 |>
mutate(
SPONSOR_CLEAN = str_to_lower(str_trim(SPONSOR)),
SPONSOR_CLEAN = str_remove(SPONSOR_CLEAN, "\\s*\\b20\\d{2}\\b$"),
SPONSOR_PREFIX = str_extract(SPONSOR_CLEAN, "^\\w+\\s*\\w*"),
FUNDING_TYPE = case_when(
str_detect(SPONSOR_PREFIX, "army|police|military|defense") ~ "Military",
str_detect(SPONSOR_PREFIX, "university|college|school|club|jac|univ") ~ "Academic/Alpine Club",
str_detect(SPONSOR_PREFIX, "guides|treks|adventure|travel|ascent|trip|climbalaya|madison|imagine|kobler|makalu|highland|satori|elite|glacier|dream|thamserku") ~ "Commercial",
str_detect(SPONSOR_PREFIX, "private|individual|self|solo|1st|friends|father|jon|jost|alex|kilian|marc|kishori|soren") ~ "Private/Individual",
str_detect(SPONSOR_PREFIX, "national geographic|japanese|korean|slovakian|french|chinese|indian|nepalese|russian|german|austrian|canadian|italian|spanish|swiss|american|latvian") ~ "National Program",
is.na(SPONSOR_PREFIX) | SPONSOR_PREFIX == "" ~ "Other/Unknown",
TRUE ~ "Other/Unknown"
)
)|>
mutate(
FUNDING_SIMPLIFIED = case_when(
FUNDING_TYPE == "Commercial" ~ "Commercial",
FUNDING_TYPE %in% c("Academic/Alpine Club", "National Program", "Military") ~ "Non-commercial",
FUNDING_TYPE %in% c("Private/Individual", "Other/Unknown") ~ "Independent/Other"
)
)
#glimpse(exped_q2)
```
##
## Question 2 : Motivation
Summary Statistics on Safety
```{r}
library(gt)
# Summarize safety metrics and render as HTML table
exped_q2 |>
group_by(FUNDING_SIMPLIFIED) |>
summarise(
n_expeditions = n(),
n_success = sum(ANY_SUCCESS),
n_fatalities = sum(FATALITIES),
pct_success = mean(ANY_SUCCESS),
pct_fatal = mean(FATALITIES),
.groups = "drop"
) |>
mutate(
pct_success = scales::percent(pct_success, accuracy = 0.1),
pct_fatal = scales::percent(pct_fatal, accuracy = 0.1)
) |>
gt() |>
tab_header(
title = "Expedition Outcomes by Funding Type (2020–2024)"
) |>
cols_label(
FUNDING_SIMPLIFIED = "Funding Type",
n_expeditions = "Total Expeditions",
n_success = "Number Successful",
n_fatalities = "Number with Fatalities",
pct_success = "Success Rate",
pct_fatal = "Fatality Rate"
)
```
::: {style="font-size:0.5em; line-height:1.2;"}
- **`Commercial expeditions appear safer`**
Commercial climbs are often assumed to be safer due to professional guides, better logistics, and more resources. The summary table supports this: commercial teams have a slightly higher success rate (77.4% vs. 70.6% for independents), but the fatality rate among independent expeditions is nearly double (5.2% vs. 2.9%).
- **`Independent teams are more numerous, but face more risk`**
Despite their prevalence (over 500 expeditions), independent teams experience lower success and higher fatality rates. This raises important questions about how funding structure influences safety outcomes.
- **`This motivates a deeper dive into funding and risk`**
The following analysis breaks expeditions into three groups Commercial, Independent, and Other and examines termination reasons, outcomes, team size distributions, and year-over-year trends to explore the relationship between funding and expedition safety.
:::
## Question 2 : How do Expeditions End?
::::: columns
::: {.column width="50%"}
```{r}
#| label: Termination Type
#| message: false
#| warning: false
# Filter
exped_q2 |>
filter(
!is.na(TERMINATION_TYPE),
!TERMINATION_TYPE %in% c("Other", "No Attempt/Base Only","Unknown","Successful")
) |>
ggplot(aes(
x = FUNDING_SIMPLIFIED,
fill = TERMINATION_TYPE
)) +
geom_bar(position = "fill") + # stacked bar shows proportions
scale_fill_brewer(palette = "YlGnBu",direction = 1, name = "Termination reason"
) +
scale_y_continuous(labels = scales::percent) +
labs(
x = NULL,
y = NULL,
fill = "Termination reason of attempted summit",
title = "Termination Reason by Funding Type",
subtitle = "Excludes 'Other' and 'No attempt'",
caption = "Source: Himalayan Database (TidyTuesday 2025)"
) +
theme_minimal()
```
:::
::: {.column width="50%"}
```{r}
#| label: Expedition Outcomes
#| message: false
#| warning: false
ggplot(exped_q2, aes(x = FUNDING_SIMPLIFIED, fill = OUTCOME_LABEL)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent) +
scale_fill_brewer(palette = "YlGnBu", direction = 1) +
labs(
title = "Expedition Outcomes by Funding Source",
x = NULL,
y = NULL,
fill = "Outcome"
) +
theme_minimal()
```
:::
:::::
## Question 2 : Expeditions End Analysis
::: {style="font-size:0.6em; line-height:1.2;"}
- **`Commercial expeditions are far more likely to end successfully`**
Over 75% of commercial expeditions ended in success, compared to around 65% for independent teams and closer to 60% for the "Other" group. Independent and other-funded expeditions show higher rates of failure, including both bad weather and medical/logistical setbacks.
- **`Termination reasons differ subtly by funding structure`**
Commercial expeditions rarely terminate due to logistical or medical issues. Most reach the summit or are turned back by weather. Independent and other expeditions show a higher share of terminations due to accidents, illness, or exhaustion.
- **`Deaths occur disproportionately on failed expeditions`**
In the outcome chart, “Success + Deaths” remains a small slice across all groups. The vast majority of fatal expeditions failed to summit, especially in the independent and other categories reinforcing that success correlates with safety.
:::
## Question 2 : Fatality Trend
::::: columns
::: {.column width="50%"}
```{r}
exped_q2 |>
group_by(YEAR, FUNDING_SIMPLIFIED) |>
summarise(
fatality_rate = mean(FATALITIES, na.rm = TRUE),
.groups = "drop"
) |>
ggplot(aes(x = YEAR, y = fatality_rate, color = FUNDING_SIMPLIFIED)) +
geom_line(linewidth = 1) +
geom_point(size = 2.5) +
scale_color_brewer(palette = "YlGnBu", direction = -1, name = "Funding Type") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
title = "Yearly Trend Fatalities by Funding Type",
subtitle = "Fatality rate = expeditions with ≥1 member or hired death",
x = "Year",
y = "Fatality Rate (%)",
caption = "Source: Himalayan Database via TidyTuesday (2025-01-21)"
) +
theme_minimal() +
theme(legend.position = "top")
```
:::
::: {.column width="50%"}
```{r}
#| label: Yearly Fatality by Funding Type
#| warning: false
exped_q2 |>
filter(!is.na(FUNDING_SIMPLIFIED), !is.na(YEAR)) |>
group_by(YEAR, FUNDING_SIMPLIFIED) |>
summarise(
total_deaths = sum(FATALITIES, na.rm = TRUE),
.groups = "drop"
) |>
ggplot(aes(x = YEAR, y = total_deaths, color = FUNDING_SIMPLIFIED)) +
geom_line(size = 1) +
geom_point(size = 2) +
scale_color_brewer(palette = "YlGnBu", direction = -1, name = "Funding Type") +
labs(
x = "Year",
y = "Total Member Fatalities",
title = "Yearly Fatality by Funding Type",
subtitle = "Total deaths per expedition 2020–2024",
caption = "Source: Himalayan Database via TidyTuesday (2025-01-21)"
) +
theme(
legend.position = "top"
) +
theme_minimal()
```
:::
:::::
## Question 2 : Fatality Trend Analysis
::: {style="font-size:0.6em; line-height:1.2;"}
- **`Independent expeditions consistently show the highest fatality risk`**\
From 2021 to 2024, independent teams reported the highest fatality rate every year rising steadily from \~4% to over 8%. This trend held across both plots, suggesting the risk isn’t a factor of sample size.
- **`Commercial expeditions maintain lower risk and fewer deaths overall`**\
The line plots show commercial expeditions peaking in 2023 with just \~3 fatalities, while independent teams saw nearly 9 in the same year more than double any other group. Their fatality rate remained under 5% throughout the entire period.
- **`“Other” funded expeditions are low but volatile`**\
The “Other” group stayed below 2 deaths annually, but their fatality rate fluctuated wildly peaking above 8% in 2023, then dropping back to zero by 2024. This variability likely reflects small sample sizes and inconsistent structure.
- **`Taken together, funding appears to shape safety outcomes`**\
Independent expeditions show consistently worse outcomes in both rates and counts. The data suggests commercial expeditions may offer better safety due to greater resources, experienced guides, and more structured decision-making.
:::
## Question 2 : Team Size a Proxy for Safety
::: {style="font-size:0.5em; line-height:1.2;"}
Next question: does team size make a difference? You’d think larger teams might be safer due to support and redundancy, but also possibly riskier if they’re less experienced or move slowly. This scatter plot checks whether total team size correlates with success, failure, or fatalities and whether that varies by funding type.
:::
::::: columns
::: {.column width="50%"}
```{r}
exped_q2 |>
filter(!is.na(TOTMEMBERS), TOTMEMBERS < 25) |>
ggplot(aes(x = TOTMEMBERS)) +
geom_histogram(binwidth = 5, fill = "steelblue", color = "white") +
facet_wrap(~ FUNDING_SIMPLIFIED) +
labs(
x = "Team Size (Total Members) Bin Width = 5",
y = "Number of Expeditions",
title = "Distribution of Team Sizes by Funding Type",
#subtitle = "Each panel shows team size distribution for one funding category (2020–2024)",
caption = "Source: Himalayan Database via TidyTuesday (2025-01-21)"
) +
theme_minimal()
```
:::
::: {.column width="50%"}
```{r}
exped_q2 |>
filter(FATALITIES == TRUE, !is.na(TOTMEMBERS), TOTMEMBERS < 25) |>
ggplot(aes(x = TOTMEMBERS)) +
geom_histogram(
binwidth = 5,
boundary = 0,
fill = "#2c7fb8", # consistent with YlGnBu middle-range blue
color = "white"
) +
facet_wrap(~ FUNDING_SIMPLIFIED) +
labs(
title = "Distribution of Team Sizes for Expeditions with Fatalities",
subtitle = "Faceted by Funding Type, 2020–2024 (Team size < 25)",
x = "Team Size (Total Members) Bin Width = 5",
y = "Number of Fatal Expeditions",
caption = "Source: Himalayan Database via TidyTuesday (2025-01-21)"
) +
theme_minimal()
```
:::
:::::
## Question 2 : Team Size as Proxy for Safety Analysis
::: {style="font-size:0.6em; line-height:1.2;"}
- **`Fatal events concentrate in the middle, not extremes`**\
Fatalities rarely occurred among the smallest or largest teams. For both commercial and independent groups, the 5–15 range saw the most deaths, hinting that “medium” teams may face unique vulnerabilities—perhaps not small enough to retreat quickly, nor large enough for redundancy.
- **`Team size as a proxy for resources in commercial climbs`**\
Large commercial groups often come with oxygen, Sherpa support, and fixed lines. Smaller independent teams may lack this safety net and push harder routes with more risk exposure.
:::
## Key Findings (Q1 – Summit Success)
- **Team Size**
- Very small teams (1–5) & medium-large (9–20) show the highest success\
- Mid-sized groups (3–8) dip, and very large (\>20) see slight declines
- **Peak Geography**
- Mid-altitude (\~7100–7300 m) hardest—technical routes, low infrastructure\
- High-traffic 8000 m peaks benefit from camps, fixed ropes and support
- **Temporal Trend**
- Pre-COVID (2020): \~72% success\
- COVID slump (2021–22): \~63%\
- Full rebound (2023–24): \~80%
- **Commercial vs Non-Commercial**
- Commercial teams recover faster post-COVID & hit \~95% success by 2024\
- Non-commercial plateau near \~64%
------------------------------------------------------------------------
## Key Findings (Q2 – Funding & Safety)
- **Funding Proxy**
- **Commercial** – guided operators & adventure companies\
- **Non-commercial** – academic clubs, national programs, military\
- **Independent/Other** – private, self-organized, unknown
- **Safety Outcomes**
- Commercial expeditions have **slightly lower fatality rates** (\~ 5–8%) vs independent (\~ 10–12%)\
- Accident-related terminations more common in non-commercial & independent groups\
- Weather-driven turn-backs uniform across all funding types
- **Fatality Trend**
- All groups show modest declines in fatality rates over 2020–24\
- **Commercial** maintain the lowest trajectory; **Independent** show the steepest decline
## Wrap Up
1. **Seasonality & route normalization** remain the strongest summit predictors.\
2. **Resource-intensive strategies** (guided support, oxygen deployment) boost success with only **marginal safety gains** once environment is controlled.\
3. **Mid-altitude peaks** present the greatest risk–reward paradox: technically harder but fewer logistics.\
4. **Stakeholders** (climbers, agencies, policymakers) can use these insights to optimize team size, funding models, and risk mitigation plans.